Abstract

Precise localisation and characterization of active regions and coronal holes as observed by
EUV imagers are crucial for a wide range of solar and heliophysics studies. We describe a
segmentation procedure, the SPOCA-suite, that produces catalogs of Active Regions and Coronal
Holes on SDO-AIA images. The method builds upon our previous work on the 'Spatial Possibilistic
Clustering Algorithm' (SPOCA) and substantially improves it in several ways. The SPOCA-suite
is applied in near real time to the AIA archive and produces entries into the AR and CH catalogs
of the Heliophysics Event Knowledgebase (HEK) every four hours. We give an illustration of the
use of SPOCA for the determination of CH filling factors. This report is intended as a reference
guide for the users of SPOCA output.
1 Introduction

Accurate determination of the properties of Active Regions and Coronal Holes on coronal images is
important for a wide range of applications. As regions of locally increased magnetic flux, active
regions (AR) are the main source of solar flares. A catalog of ARs describing key parameters such
as their location, shape, area, and mean and integrated intensity would make it possible, for
example, to relate those properties to the occurrence of flares. Having a bounding box of ARs can
prove useful when studying several thousands of ARs together, e.g. for statistical analyses such
as studies of loop oscillations. Precise localisation of coronal holes, on the other hand, is
important because of the strong association between coronal holes and high-speed solar wind
streams (Krieger et al. 1973). Finally, solar EUV flux plays a major role in Solar-Terrestrial
relationships, and hence accurate monitoring of coronal holes (CH), quiet sun (QS) and active
regions (AR) is desirable as input into solar EUV flux models.

In this paper, we present the SPOCA-suite, a set of algorithms that separates and extracts
CH, AR, and QS on EUV images. We detail the parameterization needed to treat SDO-AIA
images. The SPOCA-suite is applied in near real time to the AIA archive and produces entries into
the AR and CH catalogs of the Heliophysics Event Knowledgebase every four hours. The HEK is further
linked through an API to the graphical interface called iSolSearch, to the ontology software
package of Solarsoft, and to the JHelioviewer visualization tool. The code is written in C++, with
some wrappers in IDL and Python. It is available upon request to the corresponding author. Figure 1
shows a screenshot from the ESA JHelioviewer tool with CH and AR boundary overlays.

This paper is intended as the reference guide for the users of the SPOCA output. It presents
the improvements made with respect to the previous work on the Spatial Possibilistic Clustering
Algorithm (Barra et al. 2009). At the core of the SPOCA-suite lies a multi-channel unsupervised
fuzzy clustering method that segments EUV images into different regions according to their
intensity level.

Development of automated solar feature detection and identification methods has increased
dramatically in recent years due to the growing volume of data available. An overview of the
fundamental image-processing techniques used in these algorithms is presented in Aschwanden (2010).
These techniques are tailored to detect features in various types of observations at different
heights in the solar atmosphere, see for example Martens et al. (2012) and Perez-Suarez et al.
(2011). For example, regions of locally intense magnetic flux are observed as dark spots (sunspots)
in white light, Ca II, or continuum images, as an interlace of positive and negative magnetic field
values (magnetic AR) in magnetograms, and as patches of enhanced brightness in chromospheric
(plages) and coronal images (AR).

Image segmentation methods are typically classified into three broad categories: region-based
methods, edge-based methods and hybrid approaches. Region-based methods seek a partition of
the image satisfying a homogeneity criterion (on mono- or multispectral gray levels, or on higher
level attributes such as texture or feature vectors modeling pixels and their neighborhood). The
dual edge-based approaches aim at characterising image discontinuities, and thus locating region
boundaries. Primal edge-based methods seek maxima of intensity gradients, using either spatial or
frequency filters, or zeros in the Laplacian of the image, which is often pre-processed by low-pass
Gaussian filtering because of the sensitivity of the Laplacian to noise. Finally, the hybrid
methods either combine region and contour approaches, or rely on some other original method.



Region-based methods for the detection of sunspots include thresholding against background
(Pettauer & Brandt 1997; Colak & Qahwaji 2008), histogram-based thresholding (Steinegger et al.
1997), region-growing methods (Preminger et al. 1997), or a Bayesian approach (Turmon et al. 2002).
Colak & Qahwaji (2008) and Nguyen et al. (2005) combine thresholding and machine learning
techniques to extract and classify sunspots according to the McIntosh system. Curto et al. (2008)
and Watson et al. (2009) both use an edge-based approach based on mathematical morphology.
Zharkov et al. (2004) employ a hybrid method that combines edge-based and thresholding approaches,
whereas Lefebvre & Rozelot (2004) present a singular spectrum analysis to detect sunspots and
faculae at the solar limb.

Active regions as observed by magnetograms can be extracted and characterized by means of
region-growing techniques (Benkhalil et al. 2003; McAteer et al. 2005; Higgins et al. 2011), or
thresholding in the intensity (Qahwaji & Colak 2006; Colak & Qahwaji 2009) or wavelet domain
(Kestener et al. 2010). Verbeeck et al. (2011) provide a detailed comparison of outputs for
sunspots, magnetic, and coronal active regions using one month of SOHO-EIT data. At the
chromospheric level, network and plage regions are separated using thresholding methods
(Steinegger et al. 1998; Worden et al. 1999), possibly combined with region-growing techniques
(Benkhalil et al. 2006). Coronal active regions are segmented using either local thresholding and
region-growing methods (Benkhalil et al. 2006), supervised classification (Dudok de Wit 2006;
Colak & Qahwaji 2011), or unsupervised classification (Barra et al. 2009). Revathy et al. (2005)
compare segmentation results of the pixelwise fractal dimension of EIT images using thresholding,
region-growing techniques, and supervised classification.

Coronal holes are regions of lower electron density and temperature than the typical quiet Sun,
and thus appear as dark regions in EUV and X-ray images. However, automated detection of coronal
holes based on intensity thresholding in one wavelength (the EIT 28.4 nm wavelength in Abramenko
et al. 2009 and Obridko et al. 2009, or soft X-ray images in Vrsnak et al. 2007 and Verbanac et al.
2011) is intrinsically complicated due to the presence of filaments and transient dimmings of the
same intensity level. To resolve this ambiguity, it is necessary to make use of additional
information coming from magnetograms, from other wavelengths, or from the time evolution of the
feature in order to check the consistency of a CH candidate with actual physical parameters. For
example, Henney & Harvey (2005) first use a fixed thresholding based on two-day average He I
1083 nm spectroheliograms and thereafter check the unipolarity of the CH candidates using
photospheric magnetograms. de Toma & Arge (2005) use a combination of fixed thresholdings on
multiple wavelengths (the four SOHO-EIT bandpasses, He I 1083 nm, magnetograms, and Hα images) to
determine stringent criteria for a region to belong to a CH, whereas de Toma (2011) uses a similar
technique on synoptic maps. The approach in Scholl & Habbal (2008) is to first perform histogram
equalization and fixed thresholding to extract low-intensity regions on the four bandpasses of
SOHO-EIT. In a second stage, statistics on magnetic field parameters measured by SOHO-MDI are
evaluated to discriminate between filaments and coronal holes. A similar methodology is used in
Krista & Gallagher (2009), with the difference that it first detects low-intensity regions using
local histograms of SOHO-EIT, STEREO-EUVI, and Hinode-XRT images. Other methods include a watershed
approach (Nieniewski 2002), perimeter tracing for polar coronal holes using morphological
transforms and thresholding (Kirk et al. 2009), and the use of imaging spectroscopy to separate
quiet sun from coronal hole emission (Malanushenko & Jones 2005). Finally, the classification
approach in Dudok de Wit (2006), Colak & Qahwaji (2011), and Barra et al. (2009) allows separating
both active regions and coronal holes using the brightness intensity observed in one or in
multiple bandpasses.

In this paper, we describe the improvements made with respect to Barra et al. (2009)
in terms of robustness and stability. For example, the segmentation in Barra et al. (2009) required
the pre-determination of class centers before applying it to a large data set, which is difficult
to implement when the segmentation is performed continuously on a stream of data.

As part of the Feature Finding Team (Martens et al. 2012), programs have been specifically written
to run in near real time on SDO-AIA images. This results in two separate modules: the SPOCA-AR
module for Active Regions, and the SPOCA-CH module for Coronal Holes.

Section 2 below describes the algorithm, and Section 3 provides some illustrations of the algorithm
on SDO-AIA data.



2 The SPOCA-suite 

The SPOCA-suite implements three types of fuzzy clustering algorithms: the Fuzzy C-means (FCM),
a regularized version of FCM called the Possibilistic C-means (PCM) algorithm, and a Spatial
Possibilistic Clustering Algorithm (SPOCA) that integrates neighboring intensity values.

The description of the segmentation process in terms of fuzzy logic was motivated by the fact
that the information provided by an EUV solar image is noisy and subject to both observational
biases (line-of-sight integration of a transparent volume) and interpretation (the apparent
boundary between regions is a matter of convention). Fuzzy measures are able to represent
ill-defined classes (without a clear-cut boundary) in a natural way.

Let $x_j \in \mathbb{R}^p$ be a $p$-dimensional feature vector that describes the Sun at a
particular location. In our case, $x_j$ is a $p$-dimensional vector of intensities recorded in $p$
different channels. A fuzzy clustering algorithm searches for $C$ compact clusters amongst the
$x_j$'s in $X = \{x_j\}$. It does so by computing both a fuzzy partition matrix $U = (u_{ij})$,
$1 \le i \le C$, $1 \le j \le N$, and the cluster centers $B = (b_i \in \mathbb{R}^p, 1 \le i \le C)$.
The scalar $u_{ij} = u_i(x_j) \in [0, 1]$ is called the membership degree of $x_j$ to class $i$.

We now describe the FCM and PCM algorithms as they are used in the SPOCA-suite. A description
of the Spatial Possibilistic Clustering Algorithm is available in Barra et al. (2009).



2.1 Fuzzy Clustering Algorithm 



Since its introduction by Bezdek (1981), the Fuzzy C-Means (FCM) algorithm has been widely
used in pattern recognition and image segmentation, in various fields including medical imaging
(Philipps et al. 1995; Bezdek et al. 1997), remote sensing (Rangsanseri & Thitimajshima 1998;
Melgani et al. 2000), and color image segmentation in vision (Baker et al. 2003). Compared
to the crisp (traditional) method, fuzzy segmentation more often reaches a global optimum, rather
than merely a local optimum (Trauwaert et al. 1991).

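The idea behind FCM is the minimisation of the total intracluster variance:

$$ J_{FCM}(B, U, X) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} \, d^{2}(x_j, b_i) $$ (1)

subject to

$$ (\forall i \in \{1, \dots, C\}) \; \sum_{j=1}^{N} u_{ij} < N \quad \text{and} \quad (\forall j \in \{1, \dots, N\}) \; \sum_{i=1}^{C} u_{ij} = 1 $$ (2)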



where $d$ is a metric in $\mathbb{R}^p$ (typically, the Euclidean distance), and $m$ is a parameter
that controls the degree of fuzzification ($m = 1$ means no fuzziness). In practice, a value of
$m = 2$ is often chosen, as it allows for a fast computation in the iterative scheme. In our
application we consider either one ($p = 1$) or two channels ($p = 2$).

The minimization of (1) is reached when

$$ u_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{d^{2}(x_j, b_i)}{d^{2}(x_j, b_k)} \right)^{1/(m-1)} \right]^{-1} \quad \text{and} \quad b_i = \frac{\sum_{k=1}^{N} u_{ik}^{m} \, x_k}{\sum_{k=1}^{N} u_{ik}^{m}} $$ (3)

The values satisfying (3) are obtained through an iterative procedure.
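The alternating update in equation (3) can be sketched as follows. This is a minimal illustration for scalar features ($p = 1$) in pure Python, not the SPOCA-suite C++ code; the function names, the stopping tolerance, and the handling of a point that coincides with a center are assumptions made for the example.

```python
def fcm_memberships(x, centers, m=2.0):
    """Left part of equation (3): memberships of a scalar x to every center."""
    d2 = [(x - b) ** 2 for b in centers]
    # Assumption: a point sitting exactly on a center belongs fully to it.
    if 0.0 in d2:
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    expo = 1.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** expo for dk in d2) for di in d2]


def fcm(values, centers, m=2.0, max_iter=100, tol=1e-8):
    """Alternate the two parts of equation (3) until the centers stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        u = [fcm_memberships(x, centers, m) for x in values]  # u[j][i]
        new_centers = []
        for i in range(len(centers)):
            w = [u[j][i] ** m for j in range(len(values))]
            new_centers.append(sum(wj * x for wj, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    u = [fcm_memberships(x, centers, m) for x in values]
    return centers, u


values = [0.10, 0.12, 0.09, 1.00, 1.05, 0.95, 5.0, 5.2, 4.8]
centers, u = fcm(values, centers=[0.0, 1.0, 4.0])
```

In this toy example the three centers settle close to the three groups of values, and each row of `u` sums to one, as required by constraint (2).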

The shortcomings of FCM are of two types. First, it is sensitive to noise and outliers
(Krishnapuram & Keller 1993). Second, it is theoretically not satisfactory since the value of one
center in (3) depends on the value of the other centers.

In practice, however, FCM provides the best results for extracting Coronal Holes from the (almost
noise-free) 19.3 nm AIA images. Given the size of AIA images, FCM is applied to the histogram of
normalized intensity values, rather than to individual values. The normalization consists of
dividing by the exposure time, correcting for limb brightness enhancement, and dividing by the
median value (see Section 2.5). A bin size of 0.01 for the histogram provides the same numerical
precision as if individual values were used.
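As an illustration of clustering on a histogram rather than on individual pixels, the sketch below bins normalized intensities with a bin size of 0.01 and shows a center update in which each bin is weighted by its pixel count. The count weighting and the use of bin centers as representative values are assumptions made for this example; the text above does not specify these details.

```python
from collections import Counter


def histogram(values, binsize=0.01):
    """Return {bin-center intensity: pixel count}, ordered by intensity."""
    counts = Counter(int(v // binsize) for v in values)
    return {(k + 0.5) * binsize: n for k, n in sorted(counts.items())}


def weighted_center(hist, memberships, m=2.0):
    """Center update of equation (3) with each bin weighted by its count.

    memberships[k] is the membership of the k-th bin to the class considered.
    """
    num = sum(n * u ** m * x for (x, n), u in zip(hist.items(), memberships))
    den = sum(n * u ** m for (x, n), u in zip(hist.items(), memberships))
    return num / den


hist = histogram([0.123, 0.127, 0.131])
center = weighted_center(hist, [1.0, 1.0])
```

The cost of one iteration then scales with the number of occupied bins instead of the number of pixels.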



2.2 Possibilistic C-means algorithm 



To obtain a formulation for $u_{ij}$ that depends only on the distance to the center of class $i$,
Krishnapuram & Keller (1993, 1996) propose to minimize the objective function

$$ J_{PCM}(B, U, X) = \sum_{i=1}^{C} \left( \sum_{j=1}^{N} u_{ij}^{m} \, d^{2}(x_j, b_i) + \eta_i \sum_{j=1}^{N} (1 - u_{ij})^{m} \right) $$ (4)

subject to

$$ (\forall i \in \{1, \dots, C\}) \; \sum_{j=1}^{N} u_{ij} < N \quad \text{and} \quad (\forall j \in \{1, \dots, N\}) \; \max_{i} u_{ij} > 0 $$ (5)
The first term of $J_{PCM}$ in (4) is the intracluster variance, whereas the second term, stemming
from the relaxation of the probabilistic constraint in (2), forces $u_{ij}$ to depend only on
$d(x_j, b_i)$.

The parameter $\eta_i$ in (4) is homogeneous to a squared distance. It can be fixed, or updated at
each iteration. Krishnapuram & Keller (1993) proposed to compute $\eta_i$ as the intra-class
dispersion:

$$ \eta_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} \, d^{2}(x_j, b_i)}{\sum_{j=1}^{N} u_{ij}^{m}} $$ (6)



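The solution of the minimization of (4) satisfies:

$$ u_{ij} = \left[ 1 + \left( \frac{d^{2}(x_j, b_i)}{\eta_i} \right)^{1/(m-1)} \right]^{-1} $$ (7)

$$ b_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} \, x_j}{\sum_{j=1}^{N} u_{ij}^{m}} $$ (8)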



In practice, PCM is initialized by one iteration of FCM, which allows for the computation of
$\eta_i$ as in (6). Krishnapuram & Keller (1993) proved the convergence of iteration (7)-(8) for
fixed $\eta_i$. PCM is more robust to noise and outliers than FCM, and provides independent
functions $U_i = \{u_{ij}, j = 1, \dots, N\}$. It must however be corrected for coincident
clustering (Section 2.2.1), and a proper choice of the parameter $\eta_i$ must be made (Section
2.2.2). The SPOCA-AR module of the HEK uses this corrected PCM algorithm on the AIA 17.1 nm and
19.3 nm bandpasses. Similar to the SPOCA-CH module, it is applied to histogram intensity values
rather than to individual pixel intensity values.
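One pass of the PCM iteration can be sketched as below for scalar features. This is an illustrative sketch only: the function names are hypothetical, the dispersion follows the definition of Krishnapuram & Keller (1993) quoted above, and neither the coincident-clustering correction of Section 2.2.1 nor the constraints of Section 2.2.2 are included.

```python
def eta_dispersion(values, u_i, b_i, m=2.0):
    """Equation (6): intra-class dispersion of one class (scalar features)."""
    num = sum(u ** m * (x - b_i) ** 2 for x, u in zip(values, u_i))
    den = sum(u ** m for u in u_i)
    return num / den


def pcm_iteration(values, centers, etas, m=2.0):
    """One pass of equation (7) followed by equation (8).

    Returns (u, new_centers), where u[i][j] is the membership of
    values[j] to class i. Each class is updated independently.
    """
    expo = 1.0 / (m - 1.0)
    u = [[1.0 / (1.0 + ((x - b) ** 2 / eta) ** expo) for x in values]
         for b, eta in zip(centers, etas)]
    new_centers = [
        sum(uij ** m * x for uij, x in zip(ui, values)) / sum(uij ** m for uij in ui)
        for ui in u
    ]
    return u, new_centers


values = [0.0, 2.0]
u, new_centers = pcm_iteration(values, centers=[1.0], etas=[1.0])
eta = eta_dispersion(values, u[0], new_centers[0])
```

Unlike the FCM update, the membership of a point to class $i$ here involves only $b_i$ and $\eta_i$, which is what makes the functions $U_i$ independent.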



2.2.1 Coincident Clustering 

The original PCM suffers from convergence to a unique center. This is a typical feature of
possibilistic clustering algorithms called 'coincident clustering' (Krishnapuram & Keller 1996).
To circumvent this problem we use membership functions $u_{ij}$ which are more compact and hence
do not overlap so easily. More precisely, the exponent $1/(m-1)$ in (7) is replaced by $2/(m-1)$;
see Figure 2 for a graphical representation. We call PCM2 the algorithm where the exponent in the
membership function $u_{ij}$ is taken equal to $2/(m-1)$.
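The effect of the modified exponent can be checked numerically. The helper below is a hypothetical illustration that evaluates the membership of equation (7) with a configurable numerator in the exponent, so that `numerator=1.0` corresponds to PCM and `numerator=2.0` to PCM2.

```python
def membership(d2, eta, m=2.0, numerator=1.0):
    """Membership of equation (7) with exponent numerator/(m - 1).

    d2 is the squared distance to the class center and eta the
    regularization parameter of that class.
    """
    return 1.0 / (1.0 + (d2 / eta) ** (numerator / (m - 1.0)))


inside_pcm = membership(0.25, 1.0)                  # d^2 < eta
inside_pcm2 = membership(0.25, 1.0, numerator=2.0)
outside_pcm = membership(4.0, 1.0)                  # d^2 > eta
outside_pcm2 = membership(4.0, 1.0, numerator=2.0)
```

Both variants equal 0.5 where $d^2 = \eta_i$. The PCM2 membership is larger than the PCM one closer to the center and smaller further away, which is the sense in which it is more compact.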



2.2.2 Constraints on the parameter $\eta_i$

The dynamical range of intensities differs amongst the CH, QS, and AR classes, with active regions
having the largest spread in intensities. The parameter $\eta_i$ as computed in (6) can be viewed
as a measure of dispersion or variance within a class. In case $\eta_{AR}$ becomes prohibitively
large, the value of $u_{AR,j}$ for dark pixels $x_j$ can be higher than $u_{QS,j}$ or $u_{CH,j}$,
as illustrated in Figure 3(a). To avoid this situation we must enforce the following inequalities,
as derived in Appendix A:

$$ \frac{\eta_{QS}}{\eta_{CH}} < \frac{b_{QS,q}}{b_{CH,q}}, \quad \frac{\eta_{AR}}{\eta_{CH}} < \frac{b_{AR,q}}{b_{CH,q}}, \quad \frac{\eta_{AR}}{\eta_{QS}} < \frac{b_{AR,q}}{b_{QS,q}}, \quad \forall q = 1, \dots, p, $$ (9)

with $b_{CH,q}$, $b_{QS,q}$, and $b_{AR,q}$ the centers of classes for the $q$-th channel and for
CH, QS and AR respectively. Figure 3 shows an example of segmentation with and without constraints
on $\eta_i$. Without constraints, some CH areas get classified in the same class as the AR. This
problem is solved when constraints (9) are introduced.
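The text does not say how the inequalities (9) are enforced in the SPOCA-suite. One possible way, shown here purely as an assumption, is to cap the larger $\eta$ of each pair just below its upper bound. The function name `constrain_etas`, the dictionary layout, and the `shrink` factor are hypothetical.

```python
def constrain_etas(eta, centers, shrink=0.99):
    """Return a copy of eta in which the inequalities (9) hold.

    eta:     {"CH": float, "QS": float, "AR": float}
    centers: {"CH": [b_1, ..., b_p], ...} with positive per-channel centers.
    Assumption: a violating eta is reduced to `shrink` times its bound.
    """
    out = dict(eta)
    for low, high in (("CH", "QS"), ("CH", "AR"), ("QS", "AR")):
        # Tightest bound over all channels q = 1..p.
        ratio = min(bh / bl for bh, bl in zip(centers[high], centers[low]))
        bound = out[low] * ratio
        if out[high] >= bound:
            out[high] = shrink * bound
    return out


centers = {"CH": [0.5], "QS": [1.0], "AR": [4.0]}
fixed = constrain_etas({"CH": 1.0, "QS": 2.0, "AR": 100.0}, centers)
untouched = constrain_etas({"CH": 1.0, "QS": 1.5, "AR": 3.0}, centers)
```

The pairs are processed in the order (CH, QS), (CH, AR), (QS, AR), so that the final cap on $\eta_{AR}$ uses the already constrained $\eta_{QS}$; lowering $\eta_{AR}$ further cannot break the (CH, AR) inequality.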
For solar EUV images, the iteration scheme (6)-(8) tends to produce $\eta_{CH}$ values that
converge to zero. Due to the condition (9) on $\eta_{QS}$ and $\eta_{AR}$, these two parameters are
dragged along to converge to zero as well. Our iterative scheme therefore freezes the value of
$\eta_i$ when it has changed by a factor $\alpha$ with respect to its starting value. In other
words, formula (6) is used until iteration $it$ is reached with:

$$ \eta_i^{0} / \eta_i^{it} > \alpha \quad \text{or} \quad \eta_i^{it} / \eta_i^{0} > \alpha . $$

For the next iterations, we keep $\eta_i^{it}$ fixed. Satisfactory results have been obtained on a
variety of datasets and instruments with $\alpha = 100$.
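The freezing rule can be written as a small state update. The sketch below is illustrative; the function name and the `(eta, frozen)` return convention are assumptions.

```python
def update_eta(eta_start, eta_prev, eta_new, frozen, alpha=100.0):
    """Apply the freezing rule to one class.

    eta_start: value at the first iteration (eta_i^0)
    eta_prev:  value kept from the previous iteration
    eta_new:   value proposed by equation (6) at this iteration
    Returns (eta, frozen).
    """
    if frozen:
        return eta_prev, True
    ratio = eta_new / eta_start
    # Frozen once eta has grown or shrunk by more than a factor alpha.
    if ratio > alpha or ratio < 1.0 / alpha:
        return eta_new, True
    return eta_new, False


state = update_eta(1.0, 1.0, 0.5, False)           # still free
state2 = update_eta(1.0, state[0], 0.005, False)   # crosses the factor 100
state3 = update_eta(1.0, state2[0], 0.0001, state2[1])
```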



2.3 Smooth variation of center values 

In order to have a smooth variation of the center values over time, the centers chosen at time $t$
in the HEK are in fact the median of the last 10 centers, obtained at time stamps
$t, t-1, \dots, t-9$. In order to get the membership map corresponding to this smoothed value of
the center $b_i$, an attribution procedure using the left part of formula (3) (for FCM), or formula
(7) (for PCM), is performed.
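A running median over the last 10 centers can be kept with a bounded queue per class. The sketch below assumes scalar centers (one channel); the class name `CenterSmoother` is hypothetical.

```python
from collections import deque
from statistics import median


class CenterSmoother:
    """Keep the last `window` centers of each class and return their median."""

    def __init__(self, n_classes, window=10):
        self.history = [deque(maxlen=window) for _ in range(n_classes)]

    def update(self, centers):
        """Record the centers found at the current time stamp and
        return the smoothed centers to use for the attribution step."""
        for past, b in zip(self.history, centers):
            past.append(b)
        return [median(past) for past in self.history]


smoother = CenterSmoother(n_classes=1)
for value in range(1, 13):
    smoothed = smoother.update([float(value)])
```

After twelve updates only the values 3 to 12 remain in the window, so the smoothed center is 7.5.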



2.4 Segmented maps 

Given the membership maps U and centers B, a segmentation map can be obtained using various
decision rules:

Maximum: The most common rule is to attribute a pixel $j$ to the class $c$ for which it has the
maximum membership value: $u_{cj} = \max_{i \in \{1, \dots, C\}} u_{ij}$. This rule is used in the
SPOCA-CH module.

Threshold: On EUV images the QS class typically contains the largest number of pixels, and hence
the center of the QS class is the most stable over time. In contrast, the number of points
belonging to the AR class varies a lot over time, resulting in a high variation of the center
of the AR class. In order to have a stable segmentation of AR over time, the SPOCA-AR module
therefore uses a threshold on the QS membership to decide whether a pixel belongs to
the AR class. All pixels $j$ whose intensity is higher than $b_{QS}$ and for which $u_{QS,j}$ is
lower than 0.0001 are attributed to the AR class.

Closest: This rule attributes a pixel to the class for which the Euclidean distance to the class
center is the smallest.

Merge: This more complex procedure uses over-segmentation and merging of classes and has been
described in Barra et al. (2009).
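The first three decision rules can be sketched for a single pixel with a scalar intensity. The function names are hypothetical, and classes are identified by their index in the list of centers.

```python
def rule_maximum(u_j):
    """Index of the class with the largest membership for pixel j."""
    return max(range(len(u_j)), key=lambda i: u_j[i])


def rule_closest(x_j, centers):
    """Index of the class whose center is nearest to the intensity x_j."""
    return min(range(len(centers)), key=lambda i: abs(x_j - centers[i]))


def rule_threshold_ar(x_j, u_qs_j, b_qs, u_max=1e-4):
    """True if pixel j is attributed to the AR class: brighter than the
    QS center and with a QS membership below the threshold."""
    return x_j > b_qs and u_qs_j < u_max


label_max = rule_maximum([0.1, 0.7, 0.2])
label_closest = rule_closest(0.9, [0.2, 1.0, 3.0])
bright = rule_threshold_ar(5.0, 1e-6, b_qs=1.0)
dark = rule_threshold_ar(0.1, 1e-6, b_qs=1.0)
```

The intensity test in the threshold rule matters: a very dark pixel can also have a negligible QS membership, but it lies below $b_{QS}$ and is therefore not attributed to the AR class.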



2.5 Pre-processing 

Some pre-processing steps are needed in order to obtain an accurate segmentation of the images. 

First, images are calibrated using solar software routines and intensities are divided by the expo- 
sure time. 

Second, the SPOCA-suite might be applied on linear or on square root transformed images. A
square root transform is akin to an Anscombe transform (Anscombe 1948), which has the property
of approximately converting Poisson noise into Gaussian noise. This is especially useful for the
extraction of low-intensity regions such as coronal holes, which are affected by Poisson noise.

Third, the limb brightening effect has to be corrected before any segmentation based on intensity
can be applied. A first correction was proposed in Barra et al. (2009). It consists of applying a
polar transform to represent the image $I$ in a $(\rho, \theta)$ plane, with origin at the solar
disc center. The polar transform maps points in the cartesian plane $(x, y)$ to points in this
polar plane, as described by: $\rho = \sqrt{x^2 + y^2}$, $\theta = \mathrm{atan}(y/x)$. The
intensity is specified as a function of $\rho$ using the integral
$g(\rho) = \int_0^{2\pi} I(\rho, \theta) \, d\theta$. Denoting by $m_\odot$ the median value of
intensities on the on-disc part of the Sun, the image $I_{corr}$ corrected for the enhanced
brightness at the limb is computed as:

$$ I_{corr}(\rho, \theta) = m_\odot \, \frac{I(\rho, \theta)}{g(\rho)}, $$ (10)

for values of $\rho$ ranging between 95% and 105% of $R_\odot$. $I_{corr}$ is then finally remapped
to the cartesian plane.

Such an abrupt correction leads to discontinuities in the images around these radial distances, as
can be seen on the images in Figure 3. We propose instead to apply a correction $I_{smooth}$:

$$ I_{smooth}(x, y) = \left(1 - f(\rho(x, y))\right) I(x, y) + f(\rho(x, y)) \, I_{corr}(x, y), $$ (11)

where $f(\rho(x, y))$ introduces a smooth transition between the zones not corrected (when $\rho$
is below $r_1$ or above $r_4$) and the zones $\rho \in [r_2, r_3]$ that are fully corrected using
equation (10):

$$ f(\rho) = \begin{cases}
0 & \text{if } \rho \le r_1 \text{ or } \rho \ge r_4 \\
1 & \text{if } \rho \in [r_2, r_3] \\
\frac{1}{2} \sin\left( \frac{\pi}{r_2 - r_1} \left( \rho - \frac{r_1 + r_2}{2} \right) \right) + \frac{1}{2} & \text{if } r_1 \le \rho \le r_2 \\
\frac{1}{2} \sin\left( \frac{\pi}{r_4 - r_3} \left( \rho + \frac{r_4 - 3 r_3}{2} \right) \right) + \frac{1}{2} & \text{if } r_3 \le \rho \le r_4
\end{cases} $$

A simulation study was conducted to determine the optimal values of $[r_1, r_2, r_3, r_4]$ for a
given instrument. The mean square error (MSE) between corrected limb values and the on-disc median
intensity is used as criterion. For SDO-AIA, the values that minimize the MSE are (in % of
$R_\odot$) [70, 95, 108, 112]. Finally, $I_{smooth}$ is divided by its median value.
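The transition function and the blending of equation (11) can be sketched per pixel as follows. This is an illustration under the assumption that $f$ rises from 0 at $r_1$ to 1 at $r_2$ and falls back to 0 at $r_4$ through half-period sine ramps; radii are expressed as fractions of $R_\odot$, and the function names are hypothetical.

```python
import math


def transition(rho, r1, r2, r3, r4):
    """Weight f(rho) of the limb correction: 0 outside [r1, r4],
    1 inside [r2, r3], sine ramps in between."""
    if rho <= r1 or rho >= r4:
        return 0.0
    if r2 <= rho <= r3:
        return 1.0
    if rho < r2:
        return 0.5 * math.sin(math.pi / (r2 - r1) * (rho - (r1 + r2) / 2.0)) + 0.5
    return 0.5 * math.sin(math.pi / (r4 - r3) * (rho + (r4 - 3.0 * r3) / 2.0)) + 0.5


def smooth_correction(i_raw, i_corr, rho, radii=(0.70, 0.95, 1.08, 1.12)):
    """Equation (11) for one pixel: blend raw and limb-corrected intensity."""
    f = transition(rho, *radii)
    return (1.0 - f) * i_raw + f * i_corr


on_disc = smooth_correction(10.0, 20.0, rho=0.5)    # not corrected
at_limb = smooth_correction(10.0, 20.0, rho=1.0)    # fully corrected
halfway = smooth_correction(10.0, 20.0, rho=0.825)  # middle of the first ramp
```

With the SDO-AIA radii quoted above, a pixel at $\rho = 0.825\,R_\odot$ receives half of the correction.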

2.6 Region extraction and post-processing 

To extract regions (that will be labelled as Active Regions or as Coronal Holes) from segmented maps, 
the following post-processing steps are implemented: 



1. Compute a sinusoidal projection map (Snyder 1987). This improves the determination of regions
towards the limb.

2. Clean the segmented map by removing elements smaller than 6 arcsec using a morphological
erosion.

3. Aggregate neighboring blobs by performing a morphological closing, which consists of a
dilation by 32 arcsec followed by an erosion.

4. Compute the inverse of the sinusoidal projection.

The reader is referred to Barra et al. (2009) for an introduction to mathematical morphology in
this context. The equirectangular and Lambert cylindrical projections are also implemented in the
SPOCA-suite. Our tests on SDO data show that the sinusoidal projection gives the best results.

To remove spurious regions, a final cleaning is performed as follows. Active regions smaller than
1500 arcsec^2 and coronal holes smaller than 3000 arcsec^2 are discarded.

Moment statistics and properties such as area, barycenter location, and bounding box are then
computed. The list of all features computed can be found at
http://www.lmsal.com/hek/VOEvent_Spec.html



A coronal hole has relatively smooth boundaries. Having an accurate estimation of its shape and
localisation is crucial for space weather purposes, since coronal holes are most geoeffective when
they are located at the central meridian and near the equator. We compute a chain code for the
coronal holes using by default a maximum of 100 points to describe the contour. The details of the
algorithm are described in Appendix B. Similarly, we provide a chain code for Active Regions. Note
however that more than 100 points would be needed to represent the non-smooth AR boundaries in an
optimal way.

2.7 Tracking over time 

Following the aggregation of regions described in the previous section, an active region is defined as 
a coherent group of corresponding active region blobs, and similarly for the coronal holes. The goal 
of tracking is to appoint the same ID number to a physical region (AR or CH) over time. A region 
observed at timestamp t can follow the region observed at previous timestamp, but it can also split 
(and produce two children), or it can merge with other neighboring regions. 

Our tracking scheme amounts to creating a directed graph $(N, E)$ where $N$ is the list of nodes
representing individual regions and $E$ is the list of edges between regions. An edge between a
region observed at time $t_1$ and another observed at time $t_2$ is created if their time
difference $t_2 - t_1$ is smaller than some value, if they overlap, and if there is not already a
path between them. It is possible to first derotate the region maps before comparing them. This is
necessary when large time differences are involved.
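The edge-creation rule can be sketched as follows. In this illustration a region is a hypothetical `(id, time, pixel_set)` tuple, overlap is tested as a non-empty intersection of pixel sets, and no derotation is applied. Candidate predecessors are visited from the most recent to the oldest, an ordering chosen here so that the "no existing path" test suppresses redundant long-range edges.

```python
def has_path(edges, src, dst):
    """Depth-first search for a directed path from src to dst."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])
    return False


def build_tracking_graph(regions, max_dt):
    """Return {region_id: [ids of direct successors]}.

    regions: iterable of (region_id, time, set of (row, col) pixels).
    """
    ordered = sorted(regions, key=lambda r: r[1])
    edges = {region_id: [] for region_id, _, _ in ordered}
    for k, (b_id, b_t, b_pix) in enumerate(ordered):
        # Closest earlier regions first.
        for a_id, a_t, a_pix in reversed(ordered[:k]):
            if not 0 < b_t - a_t < max_dt:
                continue
            if a_pix & b_pix and not has_path(edges, a_id, b_id):
                edges[a_id].append(b_id)
    return edges


regions = [
    ("A", 0, {(0, 0), (0, 1)}),
    ("B", 1, {(0, 1), (0, 2)}),
    ("C", 2, {(0, 1)}),
    ("D", 2, {(9, 9)}),
]
graph = build_tracking_graph(regions, max_dt=3)
```

Here A and C overlap and are close enough in time, but no edge A to C is created because the path A, B, C already links them. A split would appear as a node with two successors, and a merge as a node with two predecessors.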

Coronal holes are long-lived features. We can include this information in the tracking and keep
only coronal holes that are older than three days. Our analysis of SDO images during the year 2011
shows that a coronal hole candidate detected for more than three consecutive days exhibits the
expected magnetic properties characteristic of unipolar regions. Hence we report to the HEK only
the CHs that are older than three days.

3 Analysis of SDO-AIA images 

We now illustrate the results from SPOCA using SDO images. Figure 4 shows an example of overlays
of AR and CH maps onto the corresponding AIA images. Figure 5 shows an example of statistics
given by the SPOCA algorithm, for the period from June 2010 to October 2011.

4 Conclusion 

We describe here the algorithm implemented in the SPOCA-AR and SPOCA-CH modules of the
HEK. This provides catalogs of Active Regions and Coronal Holes containing properties such as
localisation (through a bounding box or a chain code), area, and moments of intensities. Contours
can also be visualized through the iSolSearch interface or within the JHelioviewer visualization
software.

These catalogs have many different applications, one of them being the determination of filling
factors for CH, AR, and QS, to be included into (semi-)empirical models of irradiance.

In future research, we plan on improving the criteria for distinguishing between filaments and
coronal holes. At present the distinction is based on area (we retain only CHs larger than
3000 arcsec^2) and duration (we retain only CHs older than 3 days), as our study on the years
2010-2011 shows that this extracts CH candidates having appropriate magnetic properties. As we
reach solar maximum, however, such a distinction between filament channels and coronal holes might
become more problematic.
A Constraints on regularization parameter

All algorithms based on Possibilistic C-means strongly depend on the choice of the regularization
parameter $\eta$. Various more or less elaborate procedures have been proposed in the literature,
see for example Krishnapuram & Keller (1996). An intuitive choice is to compute $\eta_i$ as the
intra-class dispersion, as in equation (6). Problems arise however when the underlying classes
have widely different intra-class variances. For example, the highly variable AR class on EUV
images may include in the final segmentation the darkest part of what should be the Coronal Hole
class.

To understand this phenomenon, consider two classes $C_1$ and $C_2$ with centers $b_1$ and $b_2$.
Suppose $b_{1p} < b_{2p}$, $\forall p = 1, \dots, P$, where $P$ is the dimension of the feature
space. We show that under certain circumstances, $u_{2j}$ can exceed $u_{1j}$ for values of $x_j$
where $x_{jp} < b_{1p}$, $\forall p = 1, \dots, P$.

Let us first determine the locus of points $x_j$ where $u_{1j} = u_{2j}$. By equation (7) we find
that $u_{1j} = u_{2j}$ if and only if

$$ \frac{d^{2}(x_j, b_1)}{\eta_1} = \frac{d^{2}(x_j, b_2)}{\eta_2} . $$ (12)

If $\eta_1 = \eta_2$, the solution is a hyperplane through the middle of $b_1$ and $b_2$, and there
are no undesired effects. In case $\eta_1 \neq \eta_2$, and for the Euclidean distance, the above
is equivalent to saying that $x_j$ lies on a circle with center

$$ c = \frac{\eta_2 b_1 - \eta_1 b_2}{\eta_2 - \eta_1} $$ (13)

and radius

$$ \frac{\sqrt{\eta_1 \eta_2}}{|\eta_2 - \eta_1|} \, d(b_1, b_2) . $$ (14)

We consider only the case $\eta_1 < \eta_2$, as this is the case for the classes CH, QS and AR as
they arise in EUV images. In this case, $c$ will lie relatively close to $b_1$, at the "opposite
side" from $b_2$. All points $x_j$ inside the circle will satisfy $u_{1j} > u_{2j}$, and hence
belong to class 1. All points $x_j$ outside the circle will satisfy $u_{2j} > u_{1j}$, and hence
will be classified as belonging to class 2. This is unwanted behavior for those points for which
some $x_{jp} < b_{1p}$. In order to avoid this situation, we can select the $\eta$ values in such a
way that the circle center $c$ lies below all coordinate axes. The result is that for all points
$x_j$ in the circle, all positive points "below" it are also in the circle. Hence whenever a point
$x_j$ belongs to class 1, all points "below" $x_j$ also belong to class 1. So we want $c_p < 0$,
$\forall p = 1, \dots, P$, which is equivalent to the following conditions on $\eta_1$ and
$\eta_2$:

$$ \frac{\eta_2}{\eta_1} < \frac{b_{2p}}{b_{1p}}, \quad \forall p = 1, \dots, P $$ (15)
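For readers who want the intermediate algebra between (12) and (13)-(15), the following is a short derivation sketch. It assumes the Euclidean distance, $\eta_1 < \eta_2$, and positive center coordinates $b_{1p} > 0$ (intensities are positive).

```latex
\begin{align*}
\eta_2 \,\|x_j - b_1\|^2 &= \eta_1 \,\|x_j - b_2\|^2 \\
(\eta_2 - \eta_1)\,\|x_j\|^2 - 2\, x_j \cdot (\eta_2 b_1 - \eta_1 b_2)
  &= \eta_1 \|b_2\|^2 - \eta_2 \|b_1\|^2 \\
\|x_j - c\|^2 &= \|c\|^2 - \frac{\eta_2 \|b_1\|^2 - \eta_1 \|b_2\|^2}{\eta_2 - \eta_1}
  = \frac{\eta_1 \eta_2}{(\eta_2 - \eta_1)^2}\, \|b_1 - b_2\|^2,
  \qquad c = \frac{\eta_2 b_1 - \eta_1 b_2}{\eta_2 - \eta_1} \\
c_p < 0 \;&\Longleftrightarrow\; \eta_2 b_{1p} < \eta_1 b_{2p}
  \;\Longleftrightarrow\; \frac{\eta_2}{\eta_1} < \frac{b_{2p}}{b_{1p}}
\end{align*}
```

The first line is (12) multiplied by $\eta_1 \eta_2$, the third line follows by completing the square and gives the center (13) and the radius (14), and the last line uses $\eta_2 - \eta_1 > 0$ to obtain (15).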

B Computation of chain code 

Chain coding aims at representing the boundary of an object in digitized images. It is based on the
idea of following the outer edge of the object and storing the direction when travelling along the
boundary (Freeman 1961). In the SPOCA-suite, we use a representation of the chain code with eight
directions, as is commonly done in the literature. Such a representation is however of the same
length as the perimeter of the object under consideration, which in many cases is too long. In a
second step, we thus find a polygonal approximation to the perimeter that has at most a prescribed
maximal number of edges, and for which the distance from any point on the perimeter to the polygon
does not exceed a given accuracy.

The algorithm proceeds as a recursive refinement. The main axis of the contour is first extracted,
providing the first two vertices. Each polygon edge is then recursively split by introducing a new
vertex at the most distant associated contour point, until the desired accuracy is reached. More
precisely, the algorithm runs as follows:
1. Initialize the polygon with the points $p_1$ and $p_2$ of the perimeter that are furthest away
from each other.

2. Let $i = 3$.

3. For each segment in the polygon, find the point on the perimeter that has the largest distance
to the polygonal line segment. If this distance is larger than a threshold, mark the point with a
label $p_i$.

4. Renumber the points so that they are consecutive.

5. Increase $i$.

6. If no points have been added then finish, otherwise go back to step 3.

Within the HEK, a maximal number of 100 points is considered to be sufficient for a reasonable
approximation of CH boundaries.
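The refinement steps above can be sketched as follows. This is an illustrative pure-Python version, not the SPOCA-suite implementation: the contour is assumed to be an ordered list of at least two `(x, y)` points along a closed perimeter, the furthest pair is found by brute force, and the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def approximate_contour(contour, accuracy, max_points=100):
    """Return the sorted contour indices kept as polygon vertices."""
    n = len(contour)
    # Step 1: the two perimeter points furthest away from each other.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, i1 = max(pairs, key=lambda ij: math.hypot(
        contour[ij[0]][0] - contour[ij[1]][0],
        contour[ij[0]][1] - contour[ij[1]][1]))
    vertices = [i0, i1]
    while len(vertices) < max_points:
        added = []
        # Step 3: most distant contour point of every polygon edge.
        for k in range(len(vertices)):
            start = vertices[k]
            end = vertices[(k + 1) % len(vertices)]
            best, best_d = None, accuracy
            for step in range(1, (end - start) % n):
                idx = (start + step) % n
                d = point_segment_distance(contour[idx], contour[start], contour[end])
                if d > best_d:
                    best, best_d = idx, d
            if best is not None:
                added.append(best)
        if not added:  # Step 6: nothing added, accuracy reached.
            break
        # Step 4: keep the vertices in contour order.
        room = max_points - len(vertices)
        vertices = sorted(vertices + added[:room])
    return vertices


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
corners = approximate_contour(square, accuracy=0.1)
```

For this eight-point square the procedure keeps the four corners: the first pass starts from one diagonal and adds the two remaining corners, and the second pass adds nothing because every other contour point lies on a polygon edge.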
Figure 1: Screenshot from the ESA JHelioviewer tool. The picture on the right displays the AIA
171 Å image taken on 12 February 2012 at 9:02:12 together with Active Region and Coronal Hole
location and chain-code information that are recorded in the HEK. An Event Information window
pops up when clicking on an event or feature (here the large Coronal Hole located in the South
hemisphere).
Figure 2: Comparison of membership functions $u_{ij}$ when the exponent is chosen equal to
$1/(m-1)$ (red line) and $2/(m-1)$ (blue dashed line). The blue dashed line is more compact, which
will lead to the choice of distinct centres of classes.
[Figure 3 panel titles: (c) Original segmented map; (d) Segmented map with constraints]

Figure 3: (a) Illustration of membership functions $u_{ij}$ for QS and AR descriptors $x_j$.
Because of the larger spread in intensity values of the AR, small values of $x_j$ may have a larger
AR membership than QS membership. (b) EIT 195 Å image from January 1, 2000 with correction of limb
brightness. (c) Segmentation using the original PCM algorithm: the darkest parts are classified as
Active Regions. (d) Segmentation using PCM with constraints on the $\eta$-values.
[Figure 4 panel annotations. (a) Overlay of AR map on AIA 171 Å, 2011-06-22T15:00:01.34: HPCM2;
DivExpTime, ALC, DivMedian; cleaning: 6 arcsec; aggregation: 32 arcsec; projection: sinusoidal;
min size: 1500 arcsec^2. (b) Overlay of CH map on AIA 193 Å: HFCM; DivExpTime, ALC, ThrMax80,
TakeSqrt; cleaning: 6 arcsec; aggregation: 32 arcsec; projection: sinusoidal; min size:
3000 arcsec^2.]

Figure 4: (a) Overlay of an AR map created using AIA 171 Å and 193 Å onto the corresponding AIA
171 Å image taken on 20110622 at 15:00. (b) Overlay of a CH map created using AIA 193 Å onto the
corresponding AIA 193 Å image taken on 20100517 at 15:00.
Figure 5: Coronal Hole filling factor extracted from SPOCA results, from 1 June 2010 to 31
October 2011.